e61eaa38aed621dd776d0e67cfeee366-AuthorFeedback.pdf

Neural Information Processing Systems

This relationship is obvious if the transition and reward factorizations are the same, namely X[I_i] = X[J_i] for all i ∈ [m], in which case the FMDP has m independent components. The remarkable aspect here is that such a relationship holds even if the transition and reward factorizations differ arbitrarily.

To summarize the insight, in the long run, different growth rates of the counters reflect different importance of the components towards maximizing cumulative rewards, and early on, their growth can suffer large variance.

Intuition: please see our Response 2.1 for an intuitive explanation regarding why we need the cross-component bonuses. Moreover, these cross-component bonuses offer new insight (see our Response 2.1).